The following problems in the field of essay grading are considered: (1)
reliability of grading, (2) deciding upon elements or variables to consider, and (3)
deciding on the weight to assign to each variable. Described are two aspects of a
study on essay grading at the high school level in Alberta, Canada: (1) the reliability
of scoring procedures, and (2) the effectiveness of the procedures with respect to
educational objectives. The grade 12 essay examination was written by 13,000
students. It was scored by 48 readers according to 22
variables grouped according to grammar and content. The following statistical
procedures were performed: (1) correlations of scores given by readers, (2) a
factor analysis on the variables to determine what underlying elements were present,
(3) estimations of reliability of scoring by use of correlation means, and (4) an
analysis of variance on scores. The effectiveness of the variables was also examined.
Variables were then grouped according to six factors, and tables
and factor descriptions were developed. Style-content variables were found to be
very heavily weighted and, as this was against the intention of the author, another
study employing a different weighting technique is planned. (WS)
THE RELIABILITY OF ESSAY GRADING*



Despite efforts of researchers, three problems in the field of essay 
grading have persisted. These problems are: 

1) low reliability of grading

2) deciding upon the elements or variables to consider while scoring

3) deciding on what weight should be given to each of the variables

Earlier studies such as those reported by Starch and Elliot (1912),
Darsle (1922), and Hulten (1925) were concerned almost entirely with low
reliability of grading. Later studies dealt with one aspect of the problem of what
variables should be employed in grading essays, and, in particular, whether
"wholistic" or "atomistic" approaches should be used. The underlying purpose in
such studies, however, was usually to find a way of improving reliability of
grading. Studies in this area were reported by Cast (1939; 1940), Morrison and
Vernon (1941), Coward (1952), Torgerson and Green (1952), Huddleston (1954),
Remondino (1959), Diederich, French and Carleton (1961), and Coffman and Kurfman
(1968).

The purpose of this paper is to describe two aspects of a study conducted
in the broad area of essay grading at the high school leaving level in the Province
of Alberta. One aspect is the reliability of the scoring procedures, the other is 
the effectiveness of the procedures with respect to the educational objectives. 

The branches of the various provincial departments of education charged with the 
task of grading examinations are well aware of the problem of unreliability of 
scoring and most of them have devised special procedures for improving the 
reliability. The study reported here might therefore be of interest to a number
of education departments. 

The Department of Education in Alberta administers a battery of achieve- 
ment examinations for Grade XII students. Among the examinations is a two-hour 
test that involves writing an original essay. The essays are scored by a committee 
of Grade XII English teachers selected by the department. The study reported here 
centered about the scoring of these essays. 



In June of 1964 approximately 13,000 students wrote the Grade XII essay 
examination. Students were given two topics, one of which was to be developed in 
an essay of 300 words or less. The scoring was accomplished by a group of 48 
readers, divided into two groups. One group was responsible for grading mechanics 
of English, the other for grading style and content. Each essay was read twice, 
but at each reading it was scored for different things. 



Special procedures have been adopted in an effort to improve the 
reliability of scoring. Minimum qualifications of training and teaching experience 
for readers have been specified, but, more important, the scoring is done according
to a fixed pattern employing a total of 22 variables. The essays are identified by
numbers only during the scoring operation, and the readers work under the
supervision of two chairmen, one in charge of the 'mechanics' group, and the other in
charge of the 'style-content' group. The chief responsibility of the chairmen is
to see that the standard of scoring is as uniform as possible. Before the scoring
actually begins, approximately one half day is spent in discussion
of the 22 variables.

The procedure adopted in the study was to select a sample of 103 essays,
all on one topic, and to have each of the readers score all of them.
It was then possible to correlate scores of readers, and also to obtain
correlations among the 22 variables. A factor analysis, based on the matrix of
correlations between pairs of variables, was carried out for the purpose of
determining what underlying variables were present. It was also possible to
determine how much each variable contributed to the total score, and whether
the stated goal of having half the score contributed by mechanics and the
other half by 'style-content' had been achieved.

An estimate of the reliability of the scoring was obtained from the
matrix of correlations between pairs of readers, by finding the mean of these
correlations. For the 'mechanics' group the reliability was [figure illegible]; for the
'style-content' group it was .60. A suggestion at this point that the scoring of
the 'mechanics' group was more satisfactory becomes untenable when one examines
the means and standard deviations. Since all the readers scored the same 103
essays, the mean scores for the 'mechanics' readers should be equal, as should the
mean scores for the 'style-content' group. Differences in means reflect
differences in standards among markers. For the 'mechanics' group the means
varied from 40.0 to 72.4, and the standard deviations from 19.8 to 28.7. For the
'style-content' group the means ranged from 63.6 to 72.4 and the standard
deviations from 9.3 to 13.1. This would indicate that certain exacting readers
consistently found almost twice as many mechanics errors as certain lenient
readers.
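
The reliability estimate described above, the mean of the correlations between
pairs of readers, can be sketched as follows. This is an illustration on
simulated marks, not the study's data: the function name, the four hypothetical
readers, and their severity offsets are all assumptions. The sketch also suggests
why the means have to be examined separately. A correlation does not change when
one reader marks uniformly lower, so good agreement in rank order can coexist
with quite different standards.

```python
import numpy as np

def mean_interreader_correlation(scores):
    """Mean of the correlations between all pairs of readers.

    scores: array of shape (n_essays, n_readers), one column per reader.
    """
    r = np.corrcoef(scores, rowvar=False)      # reader-by-reader correlations
    upper = np.triu_indices_from(r, k=1)       # each pair counted once
    return r[upper].mean()

# Simulated marks (assumption): 103 essays, 4 hypothetical readers who agree
# on essay quality but differ in severity by a constant offset.
rng = np.random.default_rng(0)
quality = rng.normal(60, 15, size=103)
severity = np.array([0.0, -10.0, 5.0, -20.0])
noise = rng.normal(0, 8, size=(103, 4))
scores = quality[:, None] + severity[None, :] + noise

reliability = mean_interreader_correlation(scores)
reader_means = scores.mean(axis=0)

print("mean inter-reader correlation:", round(reliability, 2))
print("reader means:", np.round(reader_means, 1))
```

In this sketch the reader means differ by roughly the severity offsets, while
the mean correlation is unaffected by them.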



An analysis of variance procedure, using one of Winer's models,
shows that the variation in marks from one essay to another accounted for 59% of
the total variation. The remaining 41%, however, was not attributable to
differences among readers.
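
As a rough illustration of the kind of partition involved, the sketch below
splits the total sum of squares of an essays-by-readers table into a share for
essays, a share for readers, and a residual. The particular Winer model used in
the study is not identified in this paper, so the generic two-way layout with
one mark per cell is an assumption, as are the function name and the toy data.

```python
import numpy as np

def variance_shares(scores):
    """Split total variation in an essays-by-readers table of marks.

    scores: array of shape (n_essays, n_readers), one mark per cell.
    Returns the proportion of the total sum of squares due to essays,
    to readers, and to the residual (interaction plus error).
    """
    n_essays, n_readers = scores.shape
    grand = scores.mean()
    essay_means = scores.mean(axis=1)
    reader_means = scores.mean(axis=0)

    ss_total = ((scores - grand) ** 2).sum()
    ss_essays = n_readers * ((essay_means - grand) ** 2).sum()
    ss_readers = n_essays * ((reader_means - grand) ** 2).sum()
    ss_residual = ss_total - ss_essays - ss_readers

    return {
        "essays": ss_essays / ss_total,
        "readers": ss_readers / ss_total,
        "residual": ss_residual / ss_total,
    }

# Toy data (assumption): 2 essays marked by 2 readers.
toy = np.array([[1.0, 2.0],
                [3.0, 4.0]])
shares = variance_shares(toy)
print(shares)
```

The three shares always sum to one, so a figure such as the 59% reported for
essays leaves the remainder to be divided between readers and residual.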

In all, each essay was graded with respect to 22 variables. The
effectiveness of these variables was judged first through use of a varimax
procedure. Interpretation of the factors was attempted on the basis of the
rotation, with only loadings in excess of .65 being considered. The factor
loadings are as follows:

TABLE I: FACTOR LOADINGS OF GRADING VARIABLES

                                              Factors
Variable                          I     II    III     IV      V     VI

Mechanics
  Spelling                                                         .84
  Punctuation                                               .67
  Word Usage                           .86
  Grammar                              .76
  Sentence Errors                      .76
  Form                                                .9

Style-Content
  Significance                  .94
  Relevance                     .90
  Originality                   .89
  Plan                                        .95
  Relation of plan to essay                   .93
  Introduction                  .84
  Order                         .92
  Emphasis                      .95
  Conclusion                    .86
  Vividness of words            .94
  Figures of speech             .89
  Vocabulary                    .93
  Sentence structure            .93
  Sentence beginnings           .89
  Economy                       .91
  Total impression              .96


Owing to the number of comparatively high loadings it was not too 
difficult to find reasonable labels for the factors. 
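
For readers unfamiliar with the varimax step mentioned earlier, the following
is a minimal sketch of one common way to carry it out: unrotated loadings are
taken from the principal axes of a correlation matrix and then rotated
orthogonally so that each variable loads heavily on as few factors as possible.
It is not the program used in the study. The function names, the simulated data
with two latent factors, and the choice of principal axes as the starting
solution are all assumptions; only the .65 cut-off comes from the text.

```python
import numpy as np

def principal_loadings(corr, n_factors):
    """Unrotated loadings from the largest eigenvalues of a correlation matrix."""
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_factors]
    return eigvecs[:, order] * np.sqrt(eigvals[order])

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (iterative SVD form of the algorithm)."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.diag((rotated ** 2).sum(axis=0))
        target = rotated ** 3 - rotated @ col_ss / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break                      # no further improvement
        criterion = new_criterion
    return loadings @ rotation, rotation

# Simulated data (assumption): 103 essays, 6 variables, 2 latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(103, 2))
pattern = np.array([[.9, 0], [.9, 0], [.9, 0],
                    [0, .9], [0, .9], [0, .9]])
data = latent @ pattern.T + 0.4 * rng.normal(size=(103, 6))

corr = np.corrcoef(data, rowvar=False)
unrotated = principal_loadings(corr, n_factors=2)
rotated, rotation = varimax(unrotated)

# Keep only loadings in excess of .65 in absolute value, as in the text.
salient = np.abs(rotated) > 0.65
print(np.round(rotated, 2))
print(salient)
```

Because the rotation is orthogonal, it changes how each variable's variance is
shared among the factors but not the total explained for that variable.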

Factor I was called 'general proficiency in style and content' or 
'general impression of style and content'. The variable 'general impression'
had a very high loading, but it was not clear whether this score was awarded
on the basis of an unconscious totalling of the subscores, or whether, in reading
an essay, a marker quickly formed a general impression, then made the subscores
fit his impression.

Factor II was called 'higher mechanical skills in writing'. 

Factor III was obviously the essay plan. It was noted that this factor
seemed unrelated to other variables. For the top half of the essay scores the 
variable 'plan' was generally negatively correlated with other variables. One 
possible explanation might be that weak students were drilled so as to complete 
the plan after the essay was completed. It seemed a reasonable conclusion that 
the two variables that made up Factor III be redefined, discarded, or measured in 
another way. 

Factors IV to VI each contained only one variable, and
were labelled, respectively, form, punctuation, and spelling. The variable
labelled 'form' tended to be negatively correlated with other scores; it
therefore seemed reasonable to recommend dropping this variable from the list used in
grading essays.

A study of the contribution of each essay variable to the total score 
was very revealing. It was mentioned earlier in this paper that a decision had 
been made by the curriculum makers that the 'mechanics' and 'style-content' scores
should be equally weighted. The implementation of this decision had been rather
naive. The highest possible raw scores for the two sections were made equal, on
the assumption that this would ensure equal weighting. Furthermore, raw scores 
were assigned to each of the variables on the assumption that each of these would 
be weighted in proportion to the value assigned. 

Calculations were made so that comparisons could be made between expected
and actual contributions of each of the variables to the total score. The
comparisons are contained in the following figures:
TABLE II: STATISTICS RELATED TO ESSAY SCORING VARIABLES

                             Max.       Mean       S.D.     Contribution to
                             Score      Score      of       Total Score (%)
VARIABLES                    Possible   Awarded    Scores   Expected   Actual

Mechanics
  Spelling                      39        20.9      10.2      11.1      27.6
  Punctuation                   26        15.7       3.3       7.4       8.5
  Word Usage                    24        16.3       2.9       6.9       7.7
  Grammar                       33        24.2       3.9       9.5      12.2
  Sentence Errors               39        29.4       4.5      11.1      14.5
  Form                          14         9.7       2.2       4.0       3.3
  Total Mechanics              175*      116.2                50.0      73.8

Style-Content
  Significance                  10         6.0        .6       4.0       1.6
  Relevance                     10         6.3        .6       4.0       1.6
  Originality                    5         1.8        .7       2.0       1.4
  Plan                           5         3.0        .6       2.0       1.0
  Relation of plan to essay      5         3.3        .6       2.0       1.0
  Introduction                   5         2.6        .6       2.0       1.2
  Order                          5         2.7        .4       2.0       1.2
  Emphasis                       5         2.2        .5       2.0       1.3
  Conclusion                     5         2.3        .6       2.0       1.3
  Vividness of words            10         5.8        .8       4.0       2.3
  Figures of speech              5         2.4        .4       2.0       1.0
  Vocabulary                     5         2.5        .4       2.0       1.1
  Sentence Structure            15         8.1       1.0       6.0       3.0
  Sentence beginnings            5         2.8        .3       2.0        .8
  Economy                       15         8.0       1.0       6.0       3.0
  Total impression              15         8.0       1.2       6.0       3.4
  Total Style-Content          125        67.8                50.0      26.2

*Total scores for mechanics were not permitted to exceed 125.



The table shows that English mechanics contributed much more than
expected. It was surprising to note that spelling, alone, contributed more than the 16
'style-content' variables combined. This was definitely not the intention of the
curriculum committees in charge of Grade XII English; therefore it must be
concluded that a re-examination of the method and weighting of the variables is in
order. A re-examination of the actual variables themselves is also indicated, as
a result of the factor analysis reported earlier.
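
The arithmetic behind the expected contributions in Table II can be checked
with a short sketch. Each section is assumed to carry 50 per cent of the total,
divided among its variables in proportion to their maximum raw scores; this
agrees with the tabled figures to within 0.1. The paper does not state how the
actual contributions were calculated, so the second function is only one
plausible approach and is an assumption: it credits each variable with its
covariance with the total score, as a percentage of the variance of the total.
The simulated marks merely borrow the standard deviations reported for spelling
(10.2) and significance (.6).

```python
import numpy as np

# Maximum raw scores for the six mechanics variables, from Table II.
MECHANICS_MAX = {
    "Spelling": 39, "Punctuation": 26, "Word Usage": 24,
    "Grammar": 33, "Sentence Errors": 39, "Form": 14,
}

def expected_contributions(max_scores, section_share=50.0):
    """Per cent of the total score each variable was intended to carry."""
    total = sum(max_scores.values())
    return {name: section_share * m / total for name, m in max_scores.items()}

def covariance_shares(scores):
    """Per cent of total-score variance traceable to each column.

    Assumption, not the paper's stated method:
    share_i = 100 * cov(x_i, total) / var(total).
    """
    cov = np.cov(scores, rowvar=False)
    return 100.0 * cov.sum(axis=1) / cov.sum()

expected = expected_contributions(MECHANICS_MAX)
for name, value in expected.items():
    print(f"{name:16s} {value:5.1f}")

# Two hypothetical, independent variables with very different spreads.
rng = np.random.default_rng(2)
simulated = np.column_stack([
    rng.normal(20.0, 10.2, size=103),   # spread like spelling
    rng.normal(6.0, 0.6, size=103),     # spread like significance
])
shares = covariance_shares(simulated)
print(np.round(shares, 1))
```

Under this assumption the widely spread variable dominates the total whatever
maximum score it was assigned, which is consistent with the pattern the table
shows for spelling.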

A study involving fewer variables, and employing a different weighting 
technique, in the scoring of essays is in progress. 